KMID : 1155220220470030217
Journal of the Korean Society of Health Information and Health Statistics 2022 Volume.47 No. 3 p.217 ~ p.221
Performance Comparison of Imputation Methods Using Machine Learning Techniques for Ordinal Missing Data
Son Se-Rhim
An Hyung-Gin
Abstract
Objectives: When missing values occur, complete case analysis can produce biased results. In this paper, we discuss imputation methods that use machine learning techniques when missing values occur in ordinal variables.
Methods: We consider two machine learning techniques for imputing missing values: the ordinal decision tree, which treats the variables as ordinal, and the random forest, which treats them as nominal. In addition, we apply the cumulative logistic model. The results are compared with complete case analysis using empirical bias, empirical mean squared error, and accuracy. The same methods are applied to data from the Korea National Health and Nutrition Examination Survey.
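The three comparison criteria named above can be sketched as follows. This is a minimal illustration using the usual textbook definitions, not the authors' code: the abstract does not state the exact formulas, and the function names and example numbers are hypothetical.

```python
from statistics import mean


def empirical_bias(estimates, true_value):
    """Mean deviation of the simulated estimates from the true value."""
    return mean(e - true_value for e in estimates)


def empirical_mse(estimates, true_value):
    """Mean squared deviation of the simulated estimates from the true value."""
    return mean((e - true_value) ** 2 for e in estimates)


def imputation_accuracy(imputed, truth):
    """Share of imputed categories that equal the categories that were masked."""
    if len(imputed) != len(truth):
        raise ValueError("imputed and truth must have the same length")
    return sum(i == t for i, t in zip(imputed, truth)) / len(truth)


# Hypothetical numbers: three simulated estimates of a parameter whose true
# value is 2.5, and four imputed ordinal categories with their true values.
example_estimates = [2.0, 2.5, 3.0]
example_bias = empirical_bias(example_estimates, 2.5)
example_mse = empirical_mse(example_estimates, 2.5)
example_accuracy = imputation_accuracy([1, 2, 3, 3], [1, 2, 2, 3])
```

In a simulation study, `estimates` would hold one estimate per replicated data set, while the accuracy is computed only over the cells that were deliberately set to missing.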
Results: With five ordinal categories, the machine learning techniques perform better than the cumulative logistic model and complete case analysis. The ordinal decision tree shows lower bias, while the random forest shows higher accuracy. With three categories, the random forest performs better in all respects. In the case study, complete case analysis also yields biased results. The random forest shows the best performance, and the parametric method performs similarly to the ordinal decision tree.
Conclusions: Imputing missing values with machine learning techniques can reduce bias and improve performance. If possible, it is recommended to impute missing values with the ordinal decision tree, which reflects the meaning of the order. If that is not possible, it is recommended to treat the variables at least as nominal and then impute.
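For readers unfamiliar with the cumulative logistic model mentioned in the Methods, the sketch below shows how it can turn a linear predictor into category probabilities and an imputed category. It assumes the common parameterisation P(Y <= j) = logistic(theta_j - eta) with increasing thresholds theta_j. The function names, the threshold values, and the choice to impute the most probable category are illustrative assumptions, not details taken from the paper.

```python
import math


def category_probabilities(linear_predictor, thresholds):
    """Category probabilities under P(Y <= j) = logistic(theta_j - eta).

    `thresholds` must be in increasing order; K - 1 thresholds give K categories.
    """
    cumulative = [
        1.0 / (1.0 + math.exp(-(theta - linear_predictor)))
        for theta in thresholds
    ]
    cumulative.append(1.0)  # P(Y <= K) is always 1

    probabilities = []
    previous = 0.0
    for value in cumulative:
        # P(Y = j) is the difference of adjacent cumulative probabilities.
        probabilities.append(value - previous)
        previous = value
    return probabilities


def impute_category(linear_predictor, thresholds):
    """Return the most probable category, numbered from 1 (an assumed rule)."""
    probabilities = category_probabilities(linear_predictor, thresholds)
    return probabilities.index(max(probabilities)) + 1


# Hypothetical thresholds for a four-category ordinal variable.
example_thresholds = [-1.0, 0.0, 1.0]
high_case = impute_category(2.0, example_thresholds)
low_case = impute_category(-2.0, example_thresholds)
```

Because the probabilities come from differences of one ordered cumulative curve, the model respects the ordering of the categories, which a nominal treatment ignores.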
KEYWORD
Machine learning, Regression analysis, Decision tree, Big data, Health survey